Abstract: Thousands of activities take place on the internet every second, and the logs they generate form a body of big data that is crucial to maintaining a robust IT infrastructure. Analysing these logs is difficult because a network contains many different types of machines, each producing its own logs. The growing size and complexity of log files has made manual analysis by administrators prohibitive, which makes it important to develop tools and techniques that automate the management and analysis of system logs. The proposed technique uses the Hadoop framework to process large volumes of log data. Data mining techniques are used to extract useful information, of which clustering is the most popular. Several types of clustering, such as connectivity-based clustering, centroid-based clustering (K-Means), and density-based clustering, can be used to cluster log files. Of these, centroid-based clustering (K-Means) is considered the most effective. The input logs are clustered using K-Means and then processed in the Hadoop framework. The analysis results can be used to detect security threats in the network and to alert the administrator to events such as intrusions, SQL injection, and system errors.
Keywords: Big Data, Hadoop, Log Analysis.
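To make the clustering step concrete, the following is a minimal single-machine sketch of K-Means applied to log lines. It is not the paper's implementation: the feature extraction (`featurize`, `SQL_TOKENS`), the sample log lines, and the choice of the first k distinct points as initial centroids are all assumptions made for illustration. In a Hadoop setting, the assignment step would typically run in mappers and the centroid update in reducers, but that distribution is not shown here.

```python
import math

# Hypothetical token list; the abstract does not say how logs are vectorized.
SQL_TOKENS = ("select", "union", "drop", "' or ", "--")

def featurize(line):
    """Map a log line to (SQL-token count, error flag). Illustrative only."""
    lower = line.lower()
    sql_hits = sum(lower.count(tok) for tok in SQL_TOKENS)
    is_error = 1.0 if "error" in lower else 0.0
    return (float(sql_hits), is_error)

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, seeded with the first k distinct points (an assumption)."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Made-up sample log lines, not data from the paper.
logs = [
    "GET /index.html 200",
    "GET /item?id=1 UNION SELECT password FROM users-- 200",
    "ERROR disk quota exceeded on /var",
    "GET /about.html 200",
    "GET /item?id=1' OR 1=1; DROP TABLE users-- 200",
    "ERROR failed to open socket",
]
labels, centroids = kmeans([featurize(line) for line in logs], k=3)
print(labels)  # [0, 1, 2, 0, 1, 2]
```

In this toy run the ordinary requests, the SQL-injection attempts, and the system errors fall into three separate clusters, which is the kind of grouping an administrator could then inspect. Real logs would need richer features and a more careful initialization.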